Trying out the Transformer dataset class from Pylearn2, using our current dataset class as the raw dataset. We should be able to make a Block to apply to it using one of our processing functions: one that produces a random combination of its component augmentations.

Setting up

Loading the data and the model: the classic, loosely AlexNet-based model we've been using for a while.


In [1]:
import pylearn2.utils
import pylearn2.config
import theano
import neukrill_net.dense_dataset
import neukrill_net.utils
import numpy as np
%matplotlib inline
import matplotlib.pyplot as plt
import holoviews as hl
%load_ext holoviews.ipython
import sklearn.metrics


Using gpu device 0: Tesla K40c
:0: FutureWarning: IPython widgets are experimental and may change in the future.
Welcome to the HoloViews IPython extension! (http://ioam.github.io/holoviews/)
Available magics: %compositor, %opts, %params, %view, %%labels, %%opts, %%view

In [2]:
cd ..


/afs/inf.ed.ac.uk/user/s08/s0805516/repos/neukrill-net-work

In [3]:
settings = neukrill_net.utils.Settings("settings.json")
run_settings = neukrill_net.utils.load_run_settings(
    "run_settings/alexnet_based.json", settings, force=True)

In [4]:
# loading the model
model = pylearn2.utils.serial.load(run_settings['pickle abspath'])
# loading the data
dataset = neukrill_net.dense_dataset.DensePNGDataset(settings_path=run_settings['settings_path'],
                                            run_settings=run_settings['run_settings_path'],
                                                     train_or_predict='train',
                                                     training_set_mode='validation', force=True)


(3026,)

Making a Block

Pylearn2 uses Blocks to apply processing functions to the raw data. The Transformer class appears to also be able to take model pickle files as transforms. Hopefully, we can just make an object that inherits from the Block base class and supply it with a function to transform an image, and that will work; the documentation doesn't really say one way or the other.


In [5]:
import pylearn2.blocks
import neukrill_net.image_processing

In [6]:
b = pylearn2.blocks.Block()

In [7]:
b.fn = lambda x: neukrill_net.image_processing.flip_image(x,flip_x=True)

In [8]:
t = dataset.get_topological_view(dataset.X[:1,:])

In [9]:
t.shape


Out[9]:
(1, 48, 48, 1)

In [10]:
%opts Image style(cmap='gray')
i = hl.Image(t.reshape(t.shape[1:3]))
i


Out[10]:

In [11]:
import pdb

In [12]:
class SampleAugment(pylearn2.blocks.Block):
    def __init__(self,fn,target_shape):
        self._fn = fn
        self.cpu_only=False
        self.target_shape = target_shape
    def __call__(self,inputs):
        return self.fn(inputs)
    def fn(self,inputs):
        # prepare empty array same size as inputs
        req = inputs.shape
        sh = [inputs.shape[0]] + list(self.target_shape)
        inputs = inputs.reshape(sh)
        processed = np.zeros(sh)
        # hand the processing function each image as a 2D array
        for i in range(inputs.shape[0]):
            processed[i] = self._fn(inputs[i].reshape(self.target_shape))
        processed = processed.reshape(req)
        return processed

In [13]:
b = SampleAugment(lambda x: neukrill_net.image_processing.flip_image(x,flip_x=True),(48,48))

In [14]:
hl.Image(b(t).reshape(t.shape[1:3]))


Out[14]:

So it flips images like it's supposed to. Now we can try to make a TransformerDataset using it:


In [15]:
# want to make sure the processing is obvious
b = SampleAugment(lambda x: np.zeros(x.shape),(48,48))

In [16]:
import pylearn2.datasets.transformer_dataset

In [17]:
tdataset = pylearn2.datasets.transformer_dataset.TransformerDataset(dataset,b,
                                                                    space_preserving=True)

In [18]:
hl.Image(dataset.get_batch_topo(1).reshape(t.shape[1:3]))


Out[18]:

In [19]:
hl.Image(tdataset.get_batch_topo(1).reshape(t.shape[1:3]))


Out[19]:

It should be possible to hack this together from here: make a transformer dataset whose Block wraps a stochastic processing function, sampling from a set of possible augmentations and applying them to each image.

Stupid Transformer

The stupid transformer takes a dataset after preprocessing, i.e. after the dataset has been resized and normalised into a homogeneous numpy array. It then applies its processing function to each of the examples in the array when a batch is requested.

This is pretty easy to make; in fact, we've pretty much done it above. All we need is a stochastic augmentation function that applies a random augmentation to the supplied images each time it's called. That effectively gives us a potentially massive dataset.
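For illustration, a minimal sketch of what such a stochastic function might look like (the function name here is invented for the example; the real implementation used below is neukrill_net.augment.RandomAugment):

import numpy as np

def random_flip(image, rng=np.random):
    # flip along each axis independently with probability 0.5,
    # so repeated calls on the same image give different results
    if rng.rand() < 0.5:
        image = image[::-1, :]
    if rng.rand() < 0.5:
        image = image[:, ::-1]
    return image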


In [20]:
import neukrill_net.augment

In [21]:
reload(neukrill_net.augment)


Out[21]:
<module 'neukrill_net.augment' from '/afs/inf.ed.ac.uk/user/s08/s0805516/repos/neukrill-net-tools/neukrill_net/augment.pyc'>

In [22]:
import neukrill_net.blocks
reload(neukrill_net.blocks)


Out[22]:
<module 'neukrill_net.blocks' from '/afs/inf.ed.ac.uk/user/s08/s0805516/repos/neukrill-net-tools/neukrill_net/blocks.pyc'>

In [23]:
fn = neukrill_net.augment.RandomAugment(**{"units":"float64",
                                           "rotate":-1,
                                           "flip":1,
                                           "rotate_is_resizable":0,
                                           "shear":[0,np.pi/4,np.pi/2],
                                           "crop":[0.05,0.1,0.2],
                                           "noise":0.001,
                                           "scale":[0.9,1.0,1.1,1.5],
                                           "resize":(48,48)
                                           })

In [24]:
t.squeeze().shape


Out[24]:
(48, 48)

In [25]:
hl.Image(t.squeeze())


Out[25]:

In [26]:
hl.Image(fn(t.squeeze()))


Out[26]:

In [27]:
b = neukrill_net.blocks.SampleAugment(lambda x: fn(x),(48,48),(48,48))

In [28]:
tdataset = pylearn2.datasets.transformer_dataset.TransformerDataset(raw=dataset,transformer=b,
                                                                   space_preserving=True)

In [29]:
reload(neukrill_net.image_processing)


Out[29]:
<module 'neukrill_net.image_processing' from '/afs/inf.ed.ac.uk/user/s08/s0805516/repos/neukrill-net-tools/neukrill_net/image_processing.pyc'>

In [30]:
tdataset.get_batch_topo(2).shape


Out[30]:
(2, 48, 48, 1)

In [31]:
hl.Image(tdataset.get_batch_topo(1).reshape((48,48)))


Out[31]:

In [32]:
tdataset.get_num_examples()


Out[32]:
3026

In [33]:
batch_size = 128
num_batches = int(tdataset.get_num_examples()/batch_size)

Had to make some modifications to the Pylearn2 code to make this work:


In [34]:
import pylearn2.utils.iteration
reload(pylearn2.utils.iteration)


Out[34]:
<module 'pylearn2.utils.iteration' from '/afs/inf.ed.ac.uk/user/s08/s0805516/repos/pylearn2/pylearn2/utils/iteration.pyc'>

The iterator gets called during the SGD train loop, specifically on lines 445-464 of pylearn2/training_algorithms/sgd.py:

iterator = dataset.iterator(mode=self.train_iteration_mode,
                            batch_size=self.batch_size,
                            data_specs=flat_data_specs,
                            return_tuple=True, rng=rng,
                            num_batches=self.batches_per_iter)

on_load_batch = self.on_load_batch
for batch in iterator:
    for callback in on_load_batch:
        callback(*batch)
    self.sgd_update(*batch)
    # iterator might return a smaller batch if dataset size
    # isn't divisible by batch_size
    # Note: if data_specs[0] is a NullSpace, there is no way to know
    # how many examples would actually have been in the batch,
    # since it was empty, so actual_batch_size would be reported as 0.
    actual_batch_size = flat_data_specs[0].np_batch_size(batch)
    self.monitor.report_batch(actual_batch_size)
    for callback in self.update_callbacks:
        callback(self)

So we have to call the iterator the same way, in particular getting flat_data_specs right.


In [35]:
from pylearn2.space import CompositeSpace

In [36]:
from pylearn2.utils.data_specs import DataSpecsMapping

In [37]:
data_specs = (model.get_input_space(),model.get_input_source())

In [38]:
mapping = DataSpecsMapping(data_specs)

In [39]:
space_tuple = mapping.flatten(data_specs[0], return_tuple=True)
source_tuple =  mapping.flatten(data_specs[1], return_tuple=True)

In [40]:
flat_data_specs = (CompositeSpace(space_tuple), source_tuple)

In [41]:
iterator = tdataset.iterator(mode='random_slice', data_specs=flat_data_specs,
                             batch_size=batch_size,num_batches=num_batches)


/afs/inf.ed.ac.uk/user/s08/s0805516/repos/pylearn2/pylearn2/utils/iteration.py:783: UserWarning: dataset is using the old iterator interface which is deprecated and will become officially unsupported as of July 28, 2015. The dataset should implement a `get` method respecting the new interface.
  warnings.warn("dataset is using the old iterator interface which "

In [42]:
%pdb


Automatic pdb calling has been turned ON

In [43]:
iterator.next().shape


Out[43]:
(1, 48, 48, 128)

So the iterators can't actually produce properly shaped batches? In that case, what is it actually training on? Maybe it's silently falling back to the raw iterator? That would explain the lack of difference in actual performance.


In [44]:
iterator.raw_iterator.next().shape


Out[44]:
(1, 48, 48, 128)

In [45]:
iterator.num_examples


Out[45]:
2944

Smart Transformer

The big problem with the dataset above is that these transformations have to occur after resizing and normalisation, once all the images have been loaded into one big numpy array. We might be able to hack our way around this by loading the images unprocessed into a very large numpy array, padding the spare area around most of the images with an indicator number, then shaving that padding off before augmentation and homogenisation back down to whatever size we're aiming for.
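A rough sketch of that padding idea, purely illustrative (the pad value, fixed shape and top-left placement are all assumptions):

import numpy as np

PAD = -1.0  # indicator number marking the spare area

def pad_to(image, shape=(128, 128)):
    # embed a variably-sized image in a fixed-size array,
    # filling the spare area with the indicator value
    padded = np.full(shape, PAD)
    h, w = image.shape
    padded[:h, :w] = image
    return padded

def shave(padded):
    # recover the image by trimming rows and columns that
    # consist entirely of the indicator value
    rows = ~np.all(padded == PAD, axis=1)
    cols = ~np.all(padded == PAD, axis=0)
    return padded[rows][:, cols]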

It would be much better if the transformer dataset had a stochastic function that it applied to a set of raw images held in memory whenever it needed a batch. To build this, we're first going to create a dummy raw dataset that simply loads the raw images as a list of numpy arrays and supports the interface the Transformer class will be looking for. Then we just need to initialise our Block class with a processing function that can handle raw images.


In [46]:
import pylearn2.datasets

In [47]:
# don't have to think too hard about how to write this:
# https://stackoverflow.com/questions/19151/build-a-basic-python-iterator
class FlyIterator(object):
    """
    Simple iterator class to take a dataset and iterate over
    its contents applying a processing function. Assumes
    the dataset has a processing function to apply.
    
    It may have an issue of there being some leftover examples
    that will never be shown on any epoch. Can avoid this by
    seeding with sampled numbers from the dataset's own rng.
    """
    def __init__(self, dataset, batch_size, num_batches,
                 final_shape, seed=42):
        self.dataset = dataset
        self.batch_size = batch_size
        self.num_batches = num_batches
        self.final_shape = final_shape
        # initialise rng
        self.rng = np.random.RandomState(seed=seed)
        # shuffle indices of size equal to number of examples
        # in dataset
        N = self.dataset.get_num_examples()
        self.indices = range(N)
        self.rng.shuffle(self.indices)
        
    def __iter__(self):
        return self
    
    def next(self):
        # stop cleanly once there aren't enough indices left
        # for a full batch
        if len(self.indices) < self.batch_size:
            raise StopIteration
        # return one batch
        batch_indices = [self.indices.pop() for i in range(self.batch_size)]
        # preallocate array
        if len(self.final_shape) == 2: 
            batch = np.zeros([self.batch_size]+list(self.final_shape)+[1])
        elif len(self.final_shape) == 3:
            batch = np.zeros([self.batch_size]+list(self.final_shape))
        # iterate over indices, applying the dataset's processing function
        for i,j in enumerate(batch_indices):
            batch[i] = self.dataset.fn(self.dataset.X[j]).reshape(batch.shape[1:])
        return batch

In [48]:
class ListDataset(pylearn2.datasets.dataset.Dataset):
    """
    Loads images as raw numpy arrays in a list, tries 
    its best to respect the interface expected of a 
    Pylearn2 Dataset.
    """
    def __init__(self, transformer, settings_path="settings.json", 
                 run_settings_path="run_settings/alexnet_based.json",
                 verbose=False, force=False, seed=42):
        """
        Loads the images as a list of differently shaped
        numpy arrays and loads the labels as a vector of 
        integers, mapped deterministically.
        """
        self.fn = transformer
        # load settings
        self.settings = neukrill_net.utils.Settings(settings_path)
        self.run_settings = neukrill_net.utils.load_run_settings(run_settings_path,
                                                                 self.settings,
                                                                 force=force)
        self.X, labels = neukrill_net.utils.load_rawdata(self.settings.image_fnames,
                                                 classes=self.settings.classes,
                                                 verbose=verbose)
        # transform labels from strings to integers
        class_dictionary = {}
        for i,c in enumerate(self.settings.classes):
            class_dictionary[c] = i
        self.y = np.array(map(lambda c: class_dictionary[c],labels))
        
        # set up the random state
        self.rng = np.random.RandomState(seed)
        
        # shuffle a list of image indices
        self.N = len(self.X)
        self.indices = range(self.N)
        self.rng.shuffle(self.indices)
        
    def iterator(self, mode=None, batch_size=None, num_batches=None, rng=None,
                        data_specs=None, return_tuple=False):
        """
        Returns an iterator object with a standard Pythonic interface; iterates
        over the dataset in batches, popping off batches from a shuffled 
        list of indices.
        """
        if not num_batches:
            # guess that we want to use all of them
            num_batches = int(len(self.X)/batch_size)
        iterator = FlyIterator(dataset=self, batch_size=batch_size, 
                        num_batches=num_batches,
                        final_shape=self.run_settings["final_shape"],
                        seed=self.rng.random_integers(low=0, high=256))
        return iterator
        
    def adjust_to_be_viewed_with(self):
        raise NotImplementedError("Didn't think this was important, so didn't write it.")
    
    def get_batch_design(self, batch_size, include_labels=False):
        """
        Will return a list of the size batch_size of carefully raveled arrays.
        Optionally, will also include labels (using include_labels).
        """
        selection = self.rng.random_integers(0,high=self.N-1,size=batch_size)
        batch = [self.X[s].ravel() for s in selection]
        return batch
        
    def get_batch_topo(self, batch_size, include_labels=False):
        """
        Will return a list of the size batch_size of raw, unfiltered, artisan
        numpy arrays. Optionally, will also include labels (using include_labels).
        
        Strongly discouraged to use this method for learning code, so I guess 
        this isn't so important?
        """
        selection = self.rng.random_integers(0,high=self.N-1,size=batch_size)
        batch = [self.X[s] for s in selection]
        return batch
        
    def get_num_examples(self):
        return self.N
        
    def get_topological_view(self):
        raise NotImplementedError("Not written yet, not sure we need it")
        
    def get_weights_view(self):
        raise NotImplementedError("Not written yet, didn't think it was important")
        
    def has_targets(self):
        return self.y is not None

In [49]:
lset = ListDataset(fn,force=True)

In [50]:
i = lset.iterator(batch_size=128)

In [51]:
for b in i:
    print(b.shape)
    t = b
    break


(128, 48, 48, 1)

In [52]:
hl.Image(t[1,:].squeeze())


Out[52]:

OK, so we've written a dataset with an iterator that follows the standard Python conventions. Now all we need to do is get Pylearn2 to accept this dataset. The easiest way to do that is to write it into a YAML file and run a training script. Writing the above into modules in our codebase and using the following YAML file:


In [53]:
!cat yaml_templates/alexnet_based_listdataset.yaml


!obj:pylearn2.train.Train {
    dataset: &train !obj:neukrill_net.image_directory_dataset.ListDataset {
        transformer: !obj:neukrill_net.augment.RandomAugment {
                units: 'float',
                rotate: -1,
                rotate_is_resizable: 0,
                flip: 1,
                resize: %(final_shape)s,
                shunt: 0.075,
                shear: 5
            },
        settings_path: %(settings_path)s,
        run_settings_path: %(run_settings_path)s
    },
    model: !obj:pylearn2.models.mlp.MLP {
        batch_size: &batch_size 128,
        input_space: !obj:pylearn2.space.Conv2DSpace {
            shape: %(final_shape)s,
            num_channels: 1,
            axes: ['b', 0, 1, 'c'],
        },
        layers: [ !obj:pylearn2.models.mlp.ConvRectifiedLinear {
                     layer_name: h1,
                     output_channels: 48,
                     irange: .025,
                     init_bias: 0,
                     kernel_shape: [8, 8],
                     pool_shape: [2, 2],
                     pool_stride: [2, 2],
                     max_kernel_norm: 1.9365
                 },!obj:pylearn2.models.mlp.ConvRectifiedLinear {
                     layer_name: h2,
                     output_channels: 96,
                     irange: .025,
                     init_bias: 1,
                     kernel_shape: [5, 5],
                     pool_shape: [2, 2],
                     pool_stride: [2, 2],
                     max_kernel_norm: 1.9365
                 }, !obj:pylearn2.models.mlp.ConvRectifiedLinear {
                     layer_name: h3,
                     output_channels: 128,
                     irange: .025,
                     init_bias: 0,
                     kernel_shape: [3, 3],
                     border_mode: full,
                     pool_shape: [1, 1],
                     pool_stride: [1, 1],
                     max_kernel_norm: 1.9365
                 }, !obj:pylearn2.models.mlp.ConvRectifiedLinear {
                     layer_name: 'h4',
                     output_channels: 128,
                     irange: .025,
                     init_bias: 1,
                     kernel_shape: [3, 3],
                     border_mode: full,
                     pool_shape: [2, 2],
                     pool_stride: [2, 2],
                     max_kernel_norm: 1.9365
                 }, !obj:pylearn2.models.mlp.RectifiedLinear {
                     dim: 1024,
                     max_col_norm: 1.9,
                     layer_name: h5,
                     istdev: .05,
                     W_lr_scale: .25,
                     b_lr_scale: .25
                 }, !obj:pylearn2.models.mlp.Softmax {
                     n_classes: %(n_classes)i,
                     max_col_norm: 1.9365,
                     layer_name: y,
                     istdev: .05,
                     W_lr_scale: .25,
                     b_lr_scale: .25
                 }
                ],
    },
    algorithm: !obj:pylearn2.training_algorithms.sgd.SGD {
        train_iteration_mode: even_shuffled_sequential,
        monitor_iteration_mode: even_sequential,
        batch_size: *batch_size,
        learning_rate: .1,
        learning_rule: !obj:pylearn2.training_algorithms.learning_rule.Momentum {
            init_momentum: 0.5
        },
        monitoring_dataset: {
                'train': *train,
                'valid' : !obj:neukrill_net.dense_dataset.DensePNGDataset  {
                                settings_path: %(settings_path)s,
                                run_settings: %(run_settings_path)s,
                                training_set_mode: "validation"
            },
        },
        cost: !obj:pylearn2.costs.cost.SumOfCosts { costs: [ 
            !obj:pylearn2.costs.mlp.dropout.Dropout {
                input_include_probs: {
                    h1 : 1.,
                    h2 : 1.,
                    h3 : 1.,
                    h4 : 1.,
                    h5 : 0.5
                },
                input_scales: {
                    h1 : 1.,
                    h2 : 1.,
                    h3 : 1.,
                    h4 : 1.,
                    h5 : 2.
                }
             },
             !obj:pylearn2.costs.mlp.WeightDecay {
                 coeffs : {
                     h1 : .00005,
                     h2 : .00005,
                     h3 : .00005,
                     h4 : .00005,
                     h5 : .00005
                 }
             }
             ]
        },
        termination_criterion: !obj:pylearn2.termination_criteria.And {
            criteria: [
                !obj:pylearn2.termination_criteria.EpochCounter {
                    max_epochs: 500
                },
            ]
        }
    },
    extensions: [
        !obj:pylearn2.training_algorithms.learning_rule.MomentumAdjustor {
            start: 1,
            saturate: 250,
            final_momentum: 0.95
        },
        !obj:pylearn2.training_algorithms.sgd.LinearDecayOverEpoch {
            start: 1,
            saturate: 250,
            decay_factor: 0.025
        },
        !obj:pylearn2.train_extensions.best_params.MonitorBasedSaveBest {
             channel_name: valid_y_misclass,
             save_path: '%(save_path)s'
        },
        !obj:pylearn2.training_algorithms.sgd.MonitorBasedLRAdjuster {
            high_trigger: 1.,
            low_trigger: 0.999,
            grow_amt: 1.02,
            shrink_amt: 0.98,
            max_lr: 0.4,
            min_lr: 1e-5,
            channel_name: valid_y_misclass
        }
    ],
}

Ran the model (many times), updating the code until it worked. The above YAML now trains.
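For reference, kicking off training from a YAML template like this boils down to something like the following condensed sketch, using the same utilities called later in this notebook (the run settings filename here is an assumption):

import pylearn2.config.yaml_parse
import neukrill_net.utils

settings = neukrill_net.utils.Settings("settings.json")
# run settings filename is assumed for this example
run_settings = neukrill_net.utils.load_run_settings(
    "run_settings/alexnet_based_listdataset.json", settings, force=True)
# fill the %(...)s placeholders in the template with run settings values
ystring = neukrill_net.utils.format_yaml(run_settings, settings)
train = pylearn2.config.yaml_parse.load(ystring)
train.main_loop()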

Bugs?

We might have bugs in the ListDataset: when running it with the same augmentations used in the traditional methods, we don't see anywhere near the same performance increases. In fact, it was barely able to learn at all, so it's probably broken somehow. Since it can't learn, it might be garbling the images, producing batches that don't correspond to the targets.

The following code checks that we don't have problems with our random number generators:


In [54]:
import neukrill_net.image_directory_dataset

In [55]:
fn = neukrill_net.augment.RandomAugment(**{"units":"float64",
                                           "rotate":[0,90,180,270],
                                           "flip":1,
                                           "rotate_is_resizable":0,
                                           "normalise":{"global_or_pixel":"global",
                                "mu":0.95727,"sigma":0.1423},
                                           "resize":(48,48)
                                           })
fn2 = neukrill_net.augment.RandomAugment(**{"units":"float64",
                                           "rotate":[0,90,180,270],
                                           "flip":1,
                                           "rotate_is_resizable":0,
                                           "normalise":{"global_or_pixel":"global",
                                "mu":0.95727,"sigma":0.1423},
                                           "resize":(48,48)
                                           })

In [56]:
dataset = neukrill_net.image_directory_dataset.ListDataset(fn, force=True)
dataset2 = neukrill_net.image_directory_dataset.ListDataset(fn2, force=True)

In [57]:
iterator = dataset.iterator(batch_size=128)
iterator2 = dataset2.iterator(batch_size=128)


---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-57-421f6c05f8eb> in <module>()
----> 1 iterator = dataset.iterator(batch_size=128)
      2 iterator2 = dataset2.iterator(batch_size=128)

/afs/inf.ed.ac.uk/user/s08/s0805516/repos/neukrill-net-tools/neukrill_net/image_directory_dataset.pyc in iterator(self, mode, batch_size, num_batches, rng, data_specs, return_tuple)
    163                                 num_batches=num_batches,
    164                                 final_shape=self.run_settings["final_shape"],
--> 165                                 rng=self.rng, mode=mode)
    166         return iterator
    167 

/afs/inf.ed.ac.uk/user/s08/s0805516/repos/neukrill-net-tools/neukrill_net/image_directory_dataset.pyc in __init__(self, dataset, batch_size, num_batches, final_shape, rng, mode)
     51             self.rng.shuffle(self.indices)
     52         else:
---> 53             assert mode == 'even_sequential'
     54         # have to add this for checks during training
     55         # bit of a lie

AssertionError: 
> /afs/inf.ed.ac.uk/user/s08/s0805516/repos/neukrill-net-tools/neukrill_net/image_directory_dataset.py(53)__init__()
     52         else:
---> 53             assert mode == 'even_sequential'
     54         # have to add this for checks during training

ipdb> exit

In [ ]:
for a,b in zip(iterator,iterator2):
    if (not np.allclose(a[0],b[0]) or not np.allclose(a[1],b[1])):
        print("Shit.")

In [ ]:
not np.allclose(a[0],b[0])

In [ ]:
not np.allclose(a[0],b[0]) or not np.allclose(a[1],b[1])

Checking Against Dense

The DensePNGDataset doesn't appear to have the problems we're seeing with the ListDataset. If we load the exact same processing in both and iterate over the minibatches sequentially we should see exactly the same minibatches being produced.


In [ ]:
reload(neukrill_net.image_directory_dataset)

In [ ]:
dense = neukrill_net.dense_dataset.DensePNGDataset(
                            run_settings=run_settings['run_settings_path'],
                                        force=True, verbose=True)

In [ ]:
run_settings['run_settings_path']

In [ ]:
fn = neukrill_net.augment.RandomAugment(**{"units":"float64",
                                           "normalise":{"global_or_pixel":"global",
                                                "mu":0.95727,"sigma":0.1423},
                                           "resize":(48,48)})

In [ ]:
lists = neukrill_net.image_directory_dataset.ListDataset(fn,
                    run_settings_path=run_settings['run_settings_path'], force=True)

In [ ]:
run_settings = neukrill_net.utils.load_run_settings(
            "run_settings/alexnet_based_runtest.json",settings,
                                                   force=True)
ystring = neukrill_net.utils.format_yaml(run_settings,settings)
train = pylearn2.config.yaml_parse.load(ystring)

In [ ]:
data_specs = train.algorithm.cost.get_data_specs(train.model)

In [ ]:
mapping = DataSpecsMapping(data_specs)
space_tuple = mapping.flatten(data_specs[0], return_tuple=True)
source_tuple =  mapping.flatten(data_specs[1], return_tuple=True)
flat_data_specs = (CompositeSpace(space_tuple), source_tuple)

In [ ]:
dense_iterator = dense.iterator(mode='even_sequential',batch_size=128,
                                data_specs=flat_data_specs,return_tuple=True)

In [ ]:
list_iterator = lists.iterator(mode='sequential',batch_size=128)

In [ ]:
a = dense_iterator.next()
b = list_iterator.next()

In [ ]:
print(a[0].shape,a[1].shape)

In [ ]:
print(b[0].shape,b[1].shape)

Recent tests indicate it is working after all, so I'm abandoning these checks.

Hierarchical Models

We want to be able to represent some of the taxonomic tree information in the labels, in order to hopefully propagate some more useful information through these additional labels. This amounts to having multiple softmax layers in our output layer. These are wrapped in a FlattenerLayer, so expect to see a big n-of-k encoded vector indicating the class and superclasses as the target for every data point.
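To make that target format concrete, here is a minimal sketch of how such a concatenated n-of-k target could be built for a single example (the helper name is hypothetical; the real encoding lives in neukrill_net.encoding):

import numpy as np

def encode_hierarchy(level_indices, level_sizes):
    # one-hot encode the label at each level of the tree,
    # then concatenate into a single target vector
    parts = []
    for idx, size in zip(level_indices, level_sizes):
        onehot = np.zeros(size)
        onehot[idx] = 1.0
        parts.append(onehot)
    return np.concatenate(parts)

# e.g. with the six level sizes that appear below, the target has length 188
target = encode_hierarchy([0, 0, 0, 0, 0, 0], [121, 38, 16, 7, 4, 2])
assert target.shape == (188,)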

Unfortunately, we've got some bugs in how this is working, so we're going to look at how to debug them.


In [ ]:
import neukrill_net.image_directory_dataset

In [ ]:
dataset = neukrill_net.image_directory_dataset.ListDataset(transformer=fn,
    run_settings_path="run_settings/alexnet_based_extra_convlayer_with_superclasses.json",
                                                           force=True)

In [82]:
import neukrill_net.encoding

In [168]:
hier = neukrill_net.encoding.get_hierarchy()

In [169]:
hier


Out[169]:
[['acantharia_protist',
  'acantharia_protist_big_center',
  'acantharia_protist_halo',
  'amphipods',
  'appendicularian_fritillaridae',
  'appendicularian_s_shape',
  'appendicularian_slight_curve',
  'appendicularian_straight',
  'artifacts',
  'artifacts_edge',
  'chaetognath_non_sagitta',
  'chaetognath_other',
  'chaetognath_sagitta',
  'chordate_type1',
  'copepod_calanoid',
  'copepod_calanoid_eggs',
  'copepod_calanoid_eucalanus',
  'copepod_calanoid_flatheads',
  'copepod_calanoid_frillyAntennae',
  'copepod_calanoid_large',
  'copepod_calanoid_large_side_antennatucked',
  'copepod_calanoid_octomoms',
  'copepod_calanoid_small_longantennae',
  'copepod_cyclopoid_copilia',
  'copepod_cyclopoid_oithona',
  'copepod_cyclopoid_oithona_eggs',
  'copepod_other',
  'crustacean_other',
  'ctenophore_cestid',
  'ctenophore_cydippid_no_tentacles',
  'ctenophore_cydippid_tentacles',
  'ctenophore_lobate',
  'decapods',
  'detritus_blob',
  'detritus_filamentous',
  'detritus_other',
  'diatom_chain_string',
  'diatom_chain_tube',
  'echinoderm_larva_pluteus_brittlestar',
  'echinoderm_larva_pluteus_early',
  'echinoderm_larva_pluteus_typeC',
  'echinoderm_larva_pluteus_urchin',
  'echinoderm_larva_seastar_bipinnaria',
  'echinoderm_larva_seastar_brachiolaria',
  'echinoderm_seacucumber_auricularia_larva',
  'echinopluteus',
  'ephyra',
  'euphausiids',
  'euphausiids_young',
  'fecal_pellet',
  'fish_larvae_deep_body',
  'fish_larvae_leptocephali',
  'fish_larvae_medium_body',
  'fish_larvae_myctophids',
  'fish_larvae_thin_body',
  'fish_larvae_very_thin_body',
  'heteropod',
  'hydromedusae_aglaura',
  'hydromedusae_bell_and_tentacles',
  'hydromedusae_h15',
  'hydromedusae_haliscera',
  'hydromedusae_haliscera_small_sideview',
  'hydromedusae_liriope',
  'hydromedusae_narco_dark',
  'hydromedusae_narco_young',
  'hydromedusae_narcomedusae',
  'hydromedusae_other',
  'hydromedusae_partial_dark',
  'hydromedusae_shapeA',
  'hydromedusae_shapeA_sideview_small',
  'hydromedusae_shapeB',
  'hydromedusae_sideview_big',
  'hydromedusae_solmaris',
  'hydromedusae_solmundella',
  'hydromedusae_typeD',
  'hydromedusae_typeD_bell_and_tentacles',
  'hydromedusae_typeE',
  'hydromedusae_typeF',
  'invertebrate_larvae_other_A',
  'invertebrate_larvae_other_B',
  'jellies_tentacles',
  'polychaete',
  'protist_dark_center',
  'protist_fuzzy_olive',
  'protist_noctiluca',
  'protist_other',
  'protist_star',
  'pteropod_butterfly',
  'pteropod_theco_dev_seq',
  'pteropod_triangle',
  'radiolarian_chain',
  'radiolarian_colony',
  'shrimp-like_other',
  'shrimp_caridean',
  'shrimp_sergestidae',
  'shrimp_zoea',
  'siphonophore_calycophoran_abylidae',
  'siphonophore_calycophoran_rocketship_adult',
  'siphonophore_calycophoran_rocketship_young',
  'siphonophore_calycophoran_sphaeronectes',
  'siphonophore_calycophoran_sphaeronectes_stem',
  'siphonophore_calycophoran_sphaeronectes_young',
  'siphonophore_other_parts',
  'siphonophore_partial',
  'siphonophore_physonect',
  'siphonophore_physonect_young',
  'stomatopod',
  'tornaria_acorn_worm_larvae',
  'trichodesmium_bowtie',
  'trichodesmium_multiple',
  'trichodesmium_puff',
  'trichodesmium_tuft',
  'trochophore_larvae',
  'tunicate_doliolid',
  'tunicate_doliolid_nurse',
  'tunicate_partial',
  'tunicate_salp',
  'tunicate_salp_chains',
  'unknown_blobs_and_smudges',
  'unknown_sticks',
  'unknown_unclassified'],
 ['acantharia',
  'appendicularians',
  'calanoid',
  'calycophoran_siphonophores',
  'chaetognaths',
  'crustaceans',
  'ctenophores',
  'cyclopoid_copepods',
  'cydippid',
  'decapods_all',
  'detritus',
  'diatoms',
  'echinoderm',
  'euphausiids_all_ages',
  'fish',
  'gastropods',
  'gelatinous zooplankton',
  'no_class',
  'oithona',
  'other_hydromedusae',
  'other_invert_larvae',
  'physonect',
  'plankton',
  'pluteus',
  'protists',
  'pteropods',
  'radiolarian',
  'rocketship',
  'seastar',
  'shrimp_like',
  'siphonophores',
  'sphaeronectes',
  'sub_hydromedusae1',
  'sub_hydromedusae2',
  'sub_protists',
  'trichodesmium',
  'tunicate',
  'unknown'],
 ['calycophoran_siphonophores',
  'copepods',
  'crustaceans',
  'ctenophores',
  'cyclopoid_copepods',
  'echinoderm',
  'gastropods',
  'gelatinous zooplankton',
  'hydromedusae',
  'no_class',
  'other_invert_larvae',
  'pelagic_tunicates',
  'plankton',
  'protists',
  'shrimp_like',
  'siphonophores'],
 ['copepods',
  'crustaceans',
  'gelatinous zooplankton',
  'no_class',
  'other_invert_larvae',
  'plankton',
  'siphonophores'],
 ['crustaceans', 'gelatinous zooplankton', 'no_class', 'plankton'],
 ['no_class', 'plankton']]

In [170]:
l = sum([1 for a in hier for b in a])

In [171]:
l


Out[171]:
188

In [172]:
sum([len(a) for a in hier])


Out[172]:
188
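That 188 is just the six level sizes summed: 121 + 38 + 16 + 7 + 4 + 2 = 188, the length of the concatenated target vector.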

In [173]:
x = settings.classes[1]

In [174]:
class_dictionary = {}
for i,c in enumerate(settings.classes):
    class_dictionary[c] = i

In [175]:
conflicted = []
for x in settings.classes:
    v1 = np.array(neukrill_net.encoding.get_encoding(x,hier)[0])
    v2 = np.zeros(len(settings.classes))
    v2[class_dictionary[x]] = 1
    if not np.allclose(v1,v2):
        print(x)
        conflicted.append(x)


appendicularian_slight_curve
appendicularian_s_shape
hydromedusae_narcomedusae
hydromedusae_narco_young
shrimp_caridean
shrimp-like_other

In [176]:
hier[0] = [str(c) for c in settings.classes]

In [177]:
for x in settings.classes:
    v1 = np.array(neukrill_net.encoding.get_encoding(x,hier)[0])
    v2 = np.zeros(len(settings.classes))
    v2[class_dictionary[x]] = 1
    if not np.allclose(v1,v2):
        print(x)

In [178]:
[np.where(np.array(a)==1)[0][0] for a in neukrill_net.encoding.get_encoding(x,hier)]


Out[178]:
[120, 37, 12, 5, 3, 1]
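These are the positions of the 1 in each level's one-hot vector for x, the last class left over from the loop above (unknown_unclassified): index 120 among the 121 leaf classes, then 37, 12, 5, 3 and 1 at successively higher levels.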

In [179]:
class_dictionary = {}
for c in hier[0]:
    class_dictionary[c] = np.where(np.array([a 
                           for l in neukrill_net.encoding.get_encoding(c,hier)
                           for a in l])==1)[0]

In [180]:
class_dictionary


Out[180]:
{'acantharia_protist': array([  0, 121, 172, 180, 185, 187]),
 'acantharia_protist_big_center': array([  1, 121, 172, 180, 185, 187]),
 'acantharia_protist_halo': array([  2, 121, 172, 180, 185, 187]),
 'amphipods': array([  3, 126, 161, 176, 182, 187]),
 'appendicularian_fritillaridae': array([  4, 122, 170, 177, 183, 187]),
 'appendicularian_s_shape': array([  6, 122, 170, 177, 183, 187]),
 'appendicularian_slight_curve': array([  5, 122, 170, 177, 183, 187]),
 'appendicularian_straight': array([  7, 122, 170, 177, 183, 187]),
 'artifacts': array([  8, 138, 168, 178, 184, 186]),
 'artifacts_edge': array([  9, 138, 168, 178, 184, 186]),
 'chaetognath_non_sagitta': array([ 10, 125, 171, 180, 185, 187]),
 'chaetognath_other': array([ 11, 125, 171, 180, 185, 187]),
 'chaetognath_sagitta': array([ 12, 125, 171, 180, 185, 187]),
 'chordate_type1': array([ 13, 143, 171, 180, 185, 187]),
 'copepod_calanoid': array([ 14, 123, 160, 175, 182, 187]),
 'copepod_calanoid_eggs': array([ 15, 123, 160, 175, 182, 187]),
 'copepod_calanoid_eucalanus': array([ 16, 123, 160, 175, 182, 187]),
 'copepod_calanoid_flatheads': array([ 17, 123, 160, 175, 182, 187]),
 'copepod_calanoid_frillyAntennae': array([ 18, 123, 160, 175, 182, 187]),
 'copepod_calanoid_large': array([ 19, 123, 160, 175, 182, 187]),
 'copepod_calanoid_large_side_antennatucked': array([ 20, 123, 160, 175, 182, 187]),
 'copepod_calanoid_octomoms': array([ 21, 123, 160, 175, 182, 187]),
 'copepod_calanoid_small_longantennae': array([ 22, 123, 160, 175, 182, 187]),
 'copepod_cyclopoid_copilia': array([ 23, 128, 163, 175, 182, 187]),
 'copepod_cyclopoid_oithona': array([ 24, 139, 163, 175, 182, 187]),
 'copepod_cyclopoid_oithona_eggs': array([ 25, 139, 163, 175, 182, 187]),
 'copepod_other': array([ 26, 123, 160, 175, 182, 187]),
 'crustacean_other': array([ 27, 126, 161, 176, 182, 187]),
 'ctenophore_cestid': array([ 28, 127, 162, 177, 183, 187]),
 'ctenophore_cydippid_no_tentacles': array([ 29, 129, 162, 177, 183, 187]),
 'ctenophore_cydippid_tentacles': array([ 30, 129, 162, 177, 183, 187]),
 'ctenophore_lobate': array([ 31, 127, 162, 177, 183, 187]),
 'decapods': array([ 32, 130, 173, 176, 182, 187]),
 'detritus_blob': array([ 33, 131, 171, 180, 185, 187]),
 'detritus_filamentous': array([ 34, 131, 171, 180, 185, 187]),
 'detritus_other': array([ 35, 131, 171, 180, 185, 187]),
 'diatom_chain_string': array([ 36, 132, 171, 180, 185, 187]),
 'diatom_chain_tube': array([ 37, 132, 171, 180, 185, 187]),
 'echinoderm_larva_pluteus_brittlestar': array([ 38, 144, 164, 179, 185, 187]),
 'echinoderm_larva_pluteus_early': array([ 39, 144, 164, 179, 185, 187]),
 'echinoderm_larva_pluteus_typeC': array([ 40, 144, 164, 179, 185, 187]),
 'echinoderm_larva_pluteus_urchin': array([ 41, 144, 164, 179, 185, 187]),
 'echinoderm_larva_seastar_bipinnaria': array([ 42, 149, 164, 179, 185, 187]),
 'echinoderm_larva_seastar_brachiolaria': array([ 43, 149, 164, 179, 185, 187]),
 'echinoderm_seacucumber_auricularia_larva': array([ 44, 133, 164, 179, 185, 187]),
 'echinopluteus': array([ 45, 144, 164, 179, 185, 187]),
 'ephyra': array([ 46, 137, 166, 177, 183, 187]),
 'euphausiids': array([ 47, 134, 173, 176, 182, 187]),
 'euphausiids_young': array([ 48, 134, 173, 176, 182, 187]),
 'fecal_pellet': array([ 49, 131, 171, 180, 185, 187]),
 'fish_larvae_deep_body': array([ 50, 135, 171, 180, 185, 187]),
 'fish_larvae_leptocephali': array([ 51, 135, 171, 180, 185, 187]),
 'fish_larvae_medium_body': array([ 52, 135, 171, 180, 185, 187]),
 'fish_larvae_myctophids': array([ 53, 135, 171, 180, 185, 187]),
 'fish_larvae_thin_body': array([ 54, 135, 171, 180, 185, 187]),
 'fish_larvae_very_thin_body': array([ 55, 135, 171, 180, 185, 187]),
 'heteropod': array([ 56, 136, 165, 180, 185, 187]),
 'hydromedusae_aglaura': array([ 57, 153, 167, 177, 183, 187]),
 'hydromedusae_bell_and_tentacles': array([ 58, 140, 167, 177, 183, 187]),
 'hydromedusae_h15': array([ 59, 140, 167, 177, 183, 187]),
 'hydromedusae_haliscera': array([ 60, 153, 167, 177, 183, 187]),
 'hydromedusae_haliscera_small_sideview': array([ 61, 153, 167, 177, 183, 187]),
 'hydromedusae_liriope': array([ 62, 153, 167, 177, 183, 187]),
 'hydromedusae_narco_dark': array([ 63, 154, 167, 177, 183, 187]),
 'hydromedusae_narco_young': array([ 65, 154, 167, 177, 183, 187]),
 'hydromedusae_narcomedusae': array([ 64, 154, 167, 177, 183, 187]),
 'hydromedusae_other': array([ 66, 140, 167, 177, 183, 187]),
 'hydromedusae_partial_dark': array([ 67, 140, 167, 177, 183, 187]),
 'hydromedusae_shapeA': array([ 68, 140, 167, 177, 183, 187]),
 'hydromedusae_shapeA_sideview_small': array([ 69, 140, 167, 177, 183, 187]),
 'hydromedusae_shapeB': array([ 70, 140, 167, 177, 183, 187]),
 'hydromedusae_sideview_big': array([ 71, 140, 167, 177, 183, 187]),
 'hydromedusae_solmaris': array([ 72, 154, 167, 177, 183, 187]),
 'hydromedusae_solmundella': array([ 73, 154, 167, 177, 183, 187]),
 'hydromedusae_typeD': array([ 74, 140, 167, 177, 183, 187]),
 'hydromedusae_typeD_bell_and_tentacles': array([ 75, 140, 167, 177, 183, 187]),
 'hydromedusae_typeE': array([ 76, 140, 167, 177, 183, 187]),
 'hydromedusae_typeF': array([ 77, 140, 167, 177, 183, 187]),
 'invertebrate_larvae_other_A': array([ 78, 141, 169, 179, 185, 187]),
 'invertebrate_larvae_other_B': array([ 79, 141, 169, 179, 185, 187]),
 'jellies_tentacles': array([ 80, 137, 166, 177, 183, 187]),
 'polychaete': array([ 81, 143, 171, 180, 185, 187]),
 'protist_dark_center': array([ 82, 155, 172, 180, 185, 187]),
 'protist_fuzzy_olive': array([ 83, 155, 172, 180, 185, 187]),
 'protist_noctiluca': array([ 84, 145, 172, 180, 185, 187]),
 'protist_other': array([ 85, 155, 172, 180, 185, 187]),
 'protist_star': array([ 86, 155, 172, 180, 185, 187]),
 'pteropod_butterfly': array([ 87, 146, 165, 180, 185, 187]),
 'pteropod_theco_dev_seq': array([ 88, 146, 165, 180, 185, 187]),
 'pteropod_triangle': array([ 89, 146, 165, 180, 185, 187]),
 'radiolarian_chain': array([ 90, 147, 172, 180, 185, 187]),
 'radiolarian_colony': array([ 91, 147, 172, 180, 185, 187]),
 'shrimp-like_other': array([ 93, 150, 173, 176, 182, 187]),
 'shrimp_caridean': array([ 92, 130, 173, 176, 182, 187]),
 'shrimp_sergestidae': array([ 94, 130, 173, 176, 182, 187]),
 'shrimp_zoea': array([ 95, 130, 173, 176, 182, 187]),
 'siphonophore_calycophoran_abylidae': array([ 96, 124, 159, 181, 183, 187]),
 'siphonophore_calycophoran_rocketship_adult': array([ 97, 148, 159, 181, 183, 187]),
 'siphonophore_calycophoran_rocketship_young': array([ 98, 148, 159, 181, 183, 187]),
 'siphonophore_calycophoran_sphaeronectes': array([ 99, 152, 159, 181, 183, 187]),
 'siphonophore_calycophoran_sphaeronectes_stem': array([100, 152, 159, 181, 183, 187]),
 'siphonophore_calycophoran_sphaeronectes_young': array([101, 152, 159, 181, 183, 187]),
 'siphonophore_other_parts': array([102, 151, 174, 181, 183, 187]),
 'siphonophore_partial': array([103, 151, 174, 181, 183, 187]),
 'siphonophore_physonect': array([104, 142, 174, 181, 183, 187]),
 'siphonophore_physonect_young': array([105, 142, 174, 181, 183, 187]),
 'stomatopod': array([106, 126, 161, 176, 182, 187]),
 'tornaria_acorn_worm_larvae': array([107, 141, 169, 179, 185, 187]),
 'trichodesmium_bowtie': array([108, 156, 171, 180, 185, 187]),
 'trichodesmium_multiple': array([109, 156, 171, 180, 185, 187]),
 'trichodesmium_puff': array([110, 156, 171, 180, 185, 187]),
 'trichodesmium_tuft': array([111, 156, 171, 180, 185, 187]),
 'trochophore_larvae': array([112, 141, 169, 179, 185, 187]),
 'tunicate_doliolid': array([113, 157, 170, 177, 183, 187]),
 'tunicate_doliolid_nurse': array([114, 157, 170, 177, 183, 187]),
 'tunicate_partial': array([115, 157, 170, 177, 183, 187]),
 'tunicate_salp': array([116, 157, 170, 177, 183, 187]),
 'tunicate_salp_chains': array([117, 157, 170, 177, 183, 187]),
 'unknown_blobs_and_smudges': array([118, 158, 171, 180, 185, 187]),
 'unknown_sticks': array([119, 158, 171, 180, 185, 187]),
 'unknown_unclassified': array([120, 158, 171, 180, 185, 187])}

In [181]:
y = np.zeros((len(settings.classes),188))

In [182]:
for i,j in enumerate(map(lambda c: class_dictionary[c],settings.classes)):
    y[i,j] = 1
    print(i,j)


(0, array([  0, 121, 172, 180, 185, 187]))
(1, array([  1, 121, 172, 180, 185, 187]))
(2, array([  2, 121, 172, 180, 185, 187]))
(3, array([  3, 126, 161, 176, 182, 187]))
(4, array([  4, 122, 170, 177, 183, 187]))
(5, array([  5, 122, 170, 177, 183, 187]))
(6, array([  6, 122, 170, 177, 183, 187]))
(7, array([  7, 122, 170, 177, 183, 187]))
(8, array([  8, 138, 168, 178, 184, 186]))
(9, array([  9, 138, 168, 178, 184, 186]))
(10, array([ 10, 125, 171, 180, 185, 187]))
(11, array([ 11, 125, 171, 180, 185, 187]))
(12, array([ 12, 125, 171, 180, 185, 187]))
(13, array([ 13, 143, 171, 180, 185, 187]))
(14, array([ 14, 123, 160, 175, 182, 187]))
(15, array([ 15, 123, 160, 175, 182, 187]))
(16, array([ 16, 123, 160, 175, 182, 187]))
(17, array([ 17, 123, 160, 175, 182, 187]))
(18, array([ 18, 123, 160, 175, 182, 187]))
(19, array([ 19, 123, 160, 175, 182, 187]))
(20, array([ 20, 123, 160, 175, 182, 187]))
(21, array([ 21, 123, 160, 175, 182, 187]))
(22, array([ 22, 123, 160, 175, 182, 187]))
(23, array([ 23, 128, 163, 175, 182, 187]))
(24, array([ 24, 139, 163, 175, 182, 187]))
(25, array([ 25, 139, 163, 175, 182, 187]))
(26, array([ 26, 123, 160, 175, 182, 187]))
(27, array([ 27, 126, 161, 176, 182, 187]))
(28, array([ 28, 127, 162, 177, 183, 187]))
(29, array([ 29, 129, 162, 177, 183, 187]))
(30, array([ 30, 129, 162, 177, 183, 187]))
(31, array([ 31, 127, 162, 177, 183, 187]))
(32, array([ 32, 130, 173, 176, 182, 187]))
(33, array([ 33, 131, 171, 180, 185, 187]))
(34, array([ 34, 131, 171, 180, 185, 187]))
(35, array([ 35, 131, 171, 180, 185, 187]))
(36, array([ 36, 132, 171, 180, 185, 187]))
(37, array([ 37, 132, 171, 180, 185, 187]))
(38, array([ 38, 144, 164, 179, 185, 187]))
(39, array([ 39, 144, 164, 179, 185, 187]))
(40, array([ 40, 144, 164, 179, 185, 187]))
(41, array([ 41, 144, 164, 179, 185, 187]))
(42, array([ 42, 149, 164, 179, 185, 187]))
(43, array([ 43, 149, 164, 179, 185, 187]))
(44, array([ 44, 133, 164, 179, 185, 187]))
(45, array([ 45, 144, 164, 179, 185, 187]))
(46, array([ 46, 137, 166, 177, 183, 187]))
(47, array([ 47, 134, 173, 176, 182, 187]))
(48, array([ 48, 134, 173, 176, 182, 187]))
(49, array([ 49, 131, 171, 180, 185, 187]))
(50, array([ 50, 135, 171, 180, 185, 187]))
(51, array([ 51, 135, 171, 180, 185, 187]))
(52, array([ 52, 135, 171, 180, 185, 187]))
(53, array([ 53, 135, 171, 180, 185, 187]))
(54, array([ 54, 135, 171, 180, 185, 187]))
(55, array([ 55, 135, 171, 180, 185, 187]))
(56, array([ 56, 136, 165, 180, 185, 187]))
(57, array([ 57, 153, 167, 177, 183, 187]))
(58, array([ 58, 140, 167, 177, 183, 187]))
(59, array([ 59, 140, 167, 177, 183, 187]))
(60, array([ 60, 153, 167, 177, 183, 187]))
(61, array([ 61, 153, 167, 177, 183, 187]))
(62, array([ 62, 153, 167, 177, 183, 187]))
(63, array([ 63, 154, 167, 177, 183, 187]))
(64, array([ 64, 154, 167, 177, 183, 187]))
(65, array([ 65, 154, 167, 177, 183, 187]))
(66, array([ 66, 140, 167, 177, 183, 187]))
(67, array([ 67, 140, 167, 177, 183, 187]))
(68, array([ 68, 140, 167, 177, 183, 187]))
(69, array([ 69, 140, 167, 177, 183, 187]))
(70, array([ 70, 140, 167, 177, 183, 187]))
(71, array([ 71, 140, 167, 177, 183, 187]))
(72, array([ 72, 154, 167, 177, 183, 187]))
(73, array([ 73, 154, 167, 177, 183, 187]))
(74, array([ 74, 140, 167, 177, 183, 187]))
(75, array([ 75, 140, 167, 177, 183, 187]))
(76, array([ 76, 140, 167, 177, 183, 187]))
(77, array([ 77, 140, 167, 177, 183, 187]))
(78, array([ 78, 141, 169, 179, 185, 187]))
(79, array([ 79, 141, 169, 179, 185, 187]))
(80, array([ 80, 137, 166, 177, 183, 187]))
(81, array([ 81, 143, 171, 180, 185, 187]))
(82, array([ 82, 155, 172, 180, 185, 187]))
(83, array([ 83, 155, 172, 180, 185, 187]))
(84, array([ 84, 145, 172, 180, 185, 187]))
(85, array([ 85, 155, 172, 180, 185, 187]))
(86, array([ 86, 155, 172, 180, 185, 187]))
(87, array([ 87, 146, 165, 180, 185, 187]))
(88, array([ 88, 146, 165, 180, 185, 187]))
(89, array([ 89, 146, 165, 180, 185, 187]))
(90, array([ 90, 147, 172, 180, 185, 187]))
(91, array([ 91, 147, 172, 180, 185, 187]))
(92, array([ 92, 130, 173, 176, 182, 187]))
(93, array([ 93, 150, 173, 176, 182, 187]))
(94, array([ 94, 130, 173, 176, 182, 187]))
(95, array([ 95, 130, 173, 176, 182, 187]))
(96, array([ 96, 124, 159, 181, 183, 187]))
(97, array([ 97, 148, 159, 181, 183, 187]))
(98, array([ 98, 148, 159, 181, 183, 187]))
(99, array([ 99, 152, 159, 181, 183, 187]))
(100, array([100, 152, 159, 181, 183, 187]))
(101, array([101, 152, 159, 181, 183, 187]))
(102, array([102, 151, 174, 181, 183, 187]))
(103, array([103, 151, 174, 181, 183, 187]))
(104, array([104, 142, 174, 181, 183, 187]))
(105, array([105, 142, 174, 181, 183, 187]))
(106, array([106, 126, 161, 176, 182, 187]))
(107, array([107, 141, 169, 179, 185, 187]))
(108, array([108, 156, 171, 180, 185, 187]))
(109, array([109, 156, 171, 180, 185, 187]))
(110, array([110, 156, 171, 180, 185, 187]))
(111, array([111, 156, 171, 180, 185, 187]))
(112, array([112, 141, 169, 179, 185, 187]))
(113, array([113, 157, 170, 177, 183, 187]))
(114, array([114, 157, 170, 177, 183, 187]))
(115, array([115, 157, 170, 177, 183, 187]))
(116, array([116, 157, 170, 177, 183, 187]))
(117, array([117, 157, 170, 177, 183, 187]))
(118, array([118, 158, 171, 180, 185, 187]))
(119, array([119, 158, 171, 180, 185, 187]))
(120, array([120, 158, 171, 180, 185, 187]))

In [183]:
plt.imshow(y,cmap='Greys')


Out[183]:
<matplotlib.image.AxesImage at 0x7f731677fb10>

In [134]:
hier = neukrill_net.encoding.get_hierarchy()

In [135]:
class_dictionary2 = {}
for c in hier[0]:
    class_dictionary2[c] = np.where(np.array([a 
                           for l in neukrill_net.encoding.get_encoding(c,hier)
                           for a in l])==1)[0]

In [161]:
yb = np.zeros((len(settings.classes),188))

In [166]:
for i,j in enumerate(map(lambda c: class_dictionary2[c],settings.classes)):
    yb[i,j] = 1

In [167]:
plt.imshow(yb,cmap='Greys')


Out[167]:
<matplotlib.image.AxesImage at 0x7f731688ba50>

In [132]:
np.allclose(y[121:],yb[121:])


Out[132]:
True

In [184]:
for c in conflicted:
    print(c,class_dictionary[c],class_dictionary2[c])


(u'appendicularian_slight_curve', array([  5, 122, 170, 177, 183, 187]), array([  6, 122, 170, 177, 183, 187]))
(u'appendicularian_s_shape', array([  6, 122, 170, 177, 183, 187]), array([  5, 122, 170, 177, 183, 187]))
(u'hydromedusae_narcomedusae', array([ 64, 154, 167, 177, 183, 187]), array([ 65, 154, 167, 177, 183, 187]))
(u'hydromedusae_narco_young', array([ 65, 154, 167, 177, 183, 187]), array([ 64, 154, 167, 177, 183, 187]))
(u'shrimp_caridean', array([ 92, 130, 173, 176, 182, 187]), array([ 93, 130, 173, 176, 182, 187]))
(u'shrimp-like_other', array([ 93, 150, 173, 176, 182, 187]), array([ 92, 150, 173, 176, 182, 187]))

In [187]:
oldy = y[:]

Check for Heisenbugs:


In [189]:
for _ in range(100):
    y = np.zeros((len(settings.classes),188))
    for i,j in enumerate(map(lambda c: class_dictionary[c],settings.classes)):
        y[i,j] = 1
    if not np.allclose(y,oldy):
        print("Arrays do not match.")
    oldy = y[:]

In [213]:
reload(neukrill_net.encoding)


Out[213]:
<module 'neukrill_net.encoding' from '/afs/inf.ed.ac.uk/user/s08/s0805516/repos/neukrill-net-tools/neukrill_net/encoding.py'>

In [214]:
hierarchy = neukrill_net.encoding.get_hierarchy(settings)

In [215]:
for i,j in zip(settings.classes,hierarchy[0]):
    if i != j:
        print(i,j)

In [216]:
class_dictionary = neukrill_net.encoding.make_class_dictionary(settings.classes,hierarchy)

In [218]:
y = np.zeros((len(settings.classes),188))
for i,j in enumerate(map(lambda c: class_dictionary[c],hierarchy[0])):
    y[i,j] = 1
plt.imshow(y,cmap='Greys')


Out[218]:
<matplotlib.image.AxesImage at 0x7f731604c810>